RUSH: Balanced, Decentralized Distribution for Replicated Data in Scalable Storage Clusters
Abstract
Typical algorithms for decentralized data distribution work best in a system that is fully built before it is first used; adding or removing components results in either extensive reorganization of data or load imbalance in the system. We have developed a decentralized algorithm, RUSH (Replication Under Scalable Hashing), that maps replicated objects to a scalable collection of storage servers or disks. RUSH distributes objects to servers evenly and, when new servers are added or existing servers are removed, redistributes as few objects as possible to preserve this balanced distribution. It guarantees that replicas of a particular object are not placed on the same server, and allows servers to have different “weights,” distributing more objects to servers with higher weights. The algorithm is very fast, and scales with the number of server groups added to the system. Because there is no central directory, clients can compute data locations in parallel, allowing thousands of clients to access objects on thousands of servers simultaneously.
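The abstract does not spell out RUSH's placement procedure, so the sketch below does not implement RUSH. It illustrates the same goals (directory-free lookup, weighted servers, distinct servers per replica, little data movement when a server is added) using a different, well-known technique: weighted rendezvous hashing. The function names, the server names, and the choice of SHA-256 are assumptions made only for this illustration.

```python
import hashlib
import math


def unit_hash(object_id: str, server: str) -> float:
    """Deterministically map an (object, server) pair to a float strictly inside (0, 1)."""
    digest = hashlib.sha256(f"{object_id}|{server}".encode()).digest()
    # Keep 52 bits so that value + 0.5 is exactly representable as a float.
    value = int.from_bytes(digest[:8], "big") >> 12
    return (value + 0.5) / 2**52


def place_replicas(object_id: str, weights: dict, replicas: int) -> list:
    """Pick `replicas` distinct servers for an object (illustrative, not RUSH).

    `weights` maps a hypothetical server name to a positive weight.
    Any client holding the same `weights` computes the same answer,
    so no central directory is needed.
    """
    if replicas > len(weights):
        raise ValueError("need at least as many servers as replicas")

    def score(server: str) -> float:
        # Weighted rendezvous score: a larger weight gives a larger expected score.
        return -weights[server] / math.log(unit_hash(object_id, server))

    # The top-scoring servers are distinct by construction.
    return sorted(weights, key=score, reverse=True)[:replicas]


if __name__ == "__main__":
    cluster = {"server-a": 1.0, "server-b": 1.0, "server-c": 3.0}
    print(place_replicas("object-42", cluster, replicas=2))
```

In this scheme, adding a server changes an object's placement only if the new server scores into that object's top positions, so every relocated replica moves onto the new server. RUSH's own redistribution and speed guarantees are those stated in the abstract and are not demonstrated by this sketch.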
Similar resources
Efficient, balanced data placement algorithm in scalable storage clusters
Data distribution and load balancing become increasingly important in large-scale distributed storage systems. This paper focuses on the problem of designing optimal, self-adaptive strategies for balanced distribution and reorganization of replicated objects among dynamically heterogeneous nodes, and presents a novel decentralized algorithm, Dynamic Interval Mapping, which maps replicated o...
RDIM: A Self-adaptive and Balanced Distribution for Replicated Data in Scalable Storage Clusters
As storage systems scale from a few storage nodes to hundreds or thousands, data distribution and load balancing become increasingly important. We present a novel decentralized algorithm, RDIM (Replication Under Dynamic Interval Mapping), which maps replicated objects to a scalable collection of storage nodes. RDIM distributes objects to nodes evenly, redistributing as few objects as possible w...
Stronger Semantics for Low-Latency Geo-Replicated Storage
We present the first scalable, geo-replicated storage system that guarantees low latency, offers a rich data model, and provides “stronger” semantics. Namely, all client requests are satisfied in the local datacenter in which they arise; the system efficiently supports useful data model abstractions such as column families and counter columns; and clients can access data in a causally consistent...
A Scalable Replica Management Method in Peer-to-Peer Distributed Storage Systems
Large numbers of replicas in peer-to-peer distributed storage systems aggravate inconsistency and load imbalance. To address these data management problems, a scalable replica management method based on a decentralized, unstructured peer-to-peer network is proposed. Replicas are partitioned into different hierarchies and clusters according to single replica replication, and then replicas ...
The Quest for Balancing Peer Load in Structured Peer-to-Peer Systems
Structured peer-to-peer (P2P) systems are considered as the next generation application backbone on the Internet. An important problem of these systems is load balancing in the presence of non-uniform data distributions. In this paper we propose a completely decentralized mechanism that in parallel addresses a local and a global load balancing problem: (1) balancing the storage load uniformly a...